Technical Architecture
Production-ready SaaS stack for ModelPilot AI, a multi-tenant no-code AI chatbot platform. Optimized for solo-founder velocity, with clean abstractions that scale.
Tech Stack
What NOT to Build in MVP
6-Week Sprint Plan
Sequenced for a solo founder. Each week ships something live. Ship early, iterate fast.
Database Schema
PostgreSQL via Supabase. Multi-tenant using workspace_id on every table + Row Level Security (RLS). All timestamps in UTC.
Core Tables
Folder Structure
Monorepo with apps/web (Next.js) and apps/api (FastAPI). Shared types via packages/types.
# Root monorepo
modelpilot/
├── apps/
│   ├── web/                          # Next.js 14 frontend
│   │   ├── app/
│   │   │   ├── (auth)/               # Login, signup, onboarding
│   │   │   ├── (dashboard)/          # Authenticated app shell
│   │   │   │   ├── layout.tsx        # Sidebar + topbar wrapper
│   │   │   │   ├── page.tsx          # Dashboard
│   │   │   │   ├── chatbots/
│   │   │   │   ├── knowledge/
│   │   │   │   ├── providers/
│   │   │   │   ├── logs/
│   │   │   │   ├── widget/
│   │   │   │   ├── team/
│   │   │   │   └── pricing/
│   │   │   └── api/                  # Next.js API routes (thin proxies)
│   │   ├── components/
│   │   │   ├── ui/                   # Button, Input, Badge, Modal...
│   │   │   ├── chatbot/              # BotCard, BotEditor, ChatPreview
│   │   │   ├── knowledge/            # UploadZone, DocTable, FAQEditor
│   │   │   └── widget/               # WidgetPreview, EmbedCode
│   │   ├── lib/
│   │   │   ├── api.ts                # Typed fetch wrapper
│   │   │   ├── supabase.ts           # Supabase client
│   │   │   └── hooks/                # useWorkspace, useChatbots, ...
│   │   └── middleware.ts             # Auth guard + workspace redirect
│   │
│   └── api/                          # FastAPI backend
│       ├── main.py                   # App entry, middleware, routers
│       ├── routers/
│       │   ├── chat.py               # POST /chat (SSE stream)
│       │   ├── chatbots.py           # CRUD /chatbots
│       │   ├── knowledge.py          # Upload, list, delete
│       │   ├── providers.py          # API key management
│       │   ├── widget.py             # Public widget endpoint
│       │   ├── analytics.py          # Usage, cost, logs
│       │   └── billing.py            # Stripe webhooks
│       ├── services/
│       │   ├── llm.py                # LiteLLM wrapper
│       │   ├── rag.py                # Qdrant search + context build
│       │   ├── ingestion.py          # Chunking + embedding pipeline
│       │   ├── billing.py            # Stripe + usage metering
│       │   └── encryption.py         # AES-256 for API keys
│       ├── workers/
│       │   └── tasks.py              # Celery tasks (doc processing)
│       ├── models/
│       │   └── schemas.py            # Pydantic request/response models
│       ├── middleware/
│       │   ├── auth.py               # JWT verify + tenant inject
│       │   └── rate_limit.py         # Redis sliding window
│       └── db/
│           ├── client.py             # Supabase + asyncpg connection
│           └── queries.py            # Raw SQL helpers
│
├── packages/
│   ├── types/                        # Shared TS types
│   └── widget/                       # Embeddable widget (vanilla JS)
│       ├── src/widget.ts
│       ├── dist/widget.js            # Built, hosted on CDN
│       └── rollup.config.js
│
├── docker-compose.yml                # Redis + Qdrant local dev
└── .env.example
API Routes
REST API on FastAPI. Base URL: https://api.modelpilot.ai/v1. All routes require Authorization: Bearer <jwt> except widget endpoints.
Chatbots
Chat
Knowledge
Analytics
Chat Endpoint
SSE streaming endpoint. RAG context injected before LLM call. Token usage tracked in real-time and written async to DB.
FastAPI: apps/api/routers/chat.py
from fastapi import APIRouter, Depends, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from typing import AsyncIterator
import json, time

from ..middleware.auth import get_current_workspace
from ..services.llm import stream_chat
from ..services.rag import retrieve_context
from ..db.client import db

router = APIRouter(prefix="/chat", tags=["chat"])


class ChatRequest(BaseModel):
    chatbot_id: str
    session_id: str
    message: str
    history: list[dict] = []  # [{role, content}, ...]


@router.post("")
async def chat(
    req: ChatRequest,
    workspace = Depends(get_current_workspace)
):
    # WorkspaceCtx (middleware/auth.py) exposes the tenant as workspace_id
    ws_id = workspace.workspace_id

    # 1. Load chatbot config, scoped to the caller's workspace
    bot = await db.fetchrow(
        "SELECT * FROM chatbots WHERE id=$1 AND workspace_id=$2",
        req.chatbot_id, ws_id
    )
    if not bot:
        raise HTTPException(404, "Chatbot not found")

    # 2. RAG: retrieve relevant context
    context_chunks = await retrieve_context(
        query=req.message,
        chatbot_id=req.chatbot_id,
        workspace_id=ws_id,
        top_k=5
    )

    # 3. Build messages array
    system = bot["system_prompt"]
    if context_chunks:
        ctx_text = "\n\n".join(c["text"] for c in context_chunks)
        system += f"\n\n--- KNOWLEDGE BASE ---\n{ctx_text}"
    messages = [
        {"role": "system", "content": system},
        *req.history,
        {"role": "user", "content": req.message}
    ]

    # 4. Stream response
    start = time.perf_counter()

    async def event_stream() -> AsyncIterator[str]:
        full_text = ""
        total_tokens = 0
        cost_usd = 0.0
        async for chunk in stream_chat(
            model=bot["model"],
            messages=messages,
            temperature=bot["temperature"],
            max_tokens=bot["max_tokens"],
            workspace_id=ws_id
        ):
            if chunk.type == "text":
                full_text += chunk.text
                yield f"data: {json.dumps({'text': chunk.text})}\n\n"
            elif chunk.type == "usage":
                total_tokens = chunk.total_tokens
                cost_usd = chunk.cost_usd

        # 5. Persist once the stream is exhausted. All text has already been
        #    sent; only the final 'done' event waits for these two writes.
        latency_ms = int((time.perf_counter() - start) * 1000)
        await db.execute("""
            INSERT INTO messages (conversation_id, role, content, tokens, sources, latency_ms)
            VALUES ((SELECT id FROM conversations WHERE session_id=$1 LIMIT 1),
                    'assistant', $2, $3, $4, $5)
        """, req.session_id, full_text, total_tokens, json.dumps(context_chunks), latency_ms)
        await db.execute("""
            INSERT INTO usage_events (workspace_id, event_type, tokens_used, cost_usd, model)
            VALUES ($1, 'chat_message', $2, $3, $4)
        """, ws_id, total_tokens, cost_usd, bot["model"])
        yield f"data: {json.dumps({'done': True, 'tokens': total_tokens, 'cost': cost_usd})}\n\n"

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"}
    )
LiteLLM Service: services/llm.py
from types import SimpleNamespace

import litellm

from .encryption import decrypt_key
from ..db.client import db

# Cost per 1M tokens (input, output)
MODEL_COSTS = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "gemini-1.5-pro": (3.50, 10.50),
    "gemini-flash": (0.075, 0.30),
}

# Model-name prefix -> provider name stored in ai_providers
PROVIDER_MAP = {"gpt": "openai", "claude": "anthropic", "gemini": "google"}


async def stream_chat(model, messages, temperature, max_tokens, workspace_id):
    # Fetch the decrypted API key for this workspace + provider
    prefix = model.split("-")[0]  # 'gpt' -> 'openai', 'claude' -> 'anthropic'
    provider_name = PROVIDER_MAP.get(prefix, prefix)
    row = await db.fetchrow(
        "SELECT api_key_enc FROM ai_providers WHERE workspace_id=$1 AND provider=$2",
        workspace_id, provider_name
    )
    if row is None:
        raise ValueError(f"No API key configured for provider '{provider_name}'")
    api_key = decrypt_key(row["api_key_enc"])

    response = await litellm.acompletion(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
        api_key=api_key,
        stream=True,
        # Ask for token usage on the stream; without it the usage branch
        # below may never fire (assumes the provider supports this option)
        stream_options={"include_usage": True},
    )

    total_in = total_out = 0
    async for chunk in response:
        # The usage-only chunk can arrive with an empty choices list
        if chunk.choices:
            delta = chunk.choices[0].delta
            if delta.content:
                yield SimpleNamespace(type="text", text=delta.content)
        usage = getattr(chunk, "usage", None)
        if usage:
            total_in = usage.prompt_tokens
            total_out = usage.completion_tokens

    # Unknown models fall back to the gpt-4o rates
    ci, co = MODEL_COSTS.get(model, (5, 15))
    cost = (total_in * ci + total_out * co) / 1_000_000
    yield SimpleNamespace(type="usage", total_tokens=total_in + total_out, cost_usd=cost)
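The cost formula at the end of stream_chat is easy to check by hand. The sketch below isolates it in a hypothetical helper, estimate_cost (not part of the service), and copies two rows of the price table for illustration.

```python
# Prices are USD per 1M tokens as (input, output), copied from MODEL_COSTS.
MODEL_COSTS = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the USD cost of one call; unknown models use the (5, 15) fallback."""
    cost_in, cost_out = MODEL_COSTS.get(model, (5.0, 15.0))
    return (prompt_tokens * cost_in + completion_tokens * cost_out) / 1_000_000


# 1,200 prompt tokens and 300 completion tokens on gpt-4o-mini:
# (1200 * 0.15 + 300 * 0.60) / 1,000,000, i.e. about $0.00036
print(f"{estimate_cost('gpt-4o-mini', 1200, 300):.5f}")
```

Because unknown models silently fall back to the gpt-4o rates, a misspelled model name would be billed at the highest tier in this table. Logging such fallbacks may be worth considering.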
Next.js: Consuming the stream (TypeScript)
// components/chatbot/ChatPreview.tsx
export async function sendMessage(
  chatbotId: string,
  message: string,
  history: Message[],
  onChunk: (text: string) => void,
  onDone: (usage: Usage) => void
) {
  const res = await fetch(`${API_URL}/chat`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${await getAccessToken()}`,
    },
    body: JSON.stringify({
      chatbot_id: chatbotId,
      session_id: getSessionId(),
      message,
      history
    }),
  });
  if (!res.ok || !res.body) {
    throw new Error(`Chat request failed: ${res.status}`);
  }

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // A read can end mid-event, so buffer and parse only complete lines
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const payload = JSON.parse(line.slice(6));
      if (payload.text) onChunk(payload.text);
      if (payload.done) onDone({ tokens: payload.tokens, cost: payload.cost });
    }
  }
}
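SSE events on this endpoint are `data: <json>` lines terminated by a blank line, and a network read can end in the middle of an event. The Python sketch below shows the wire format from both sides. The names sse_event and iter_sse_payloads are hypothetical and used only for illustration. Any consumer, including the browser code, generally needs the same buffering.

```python
import json
from typing import Iterable, Iterator


def sse_event(payload: dict) -> str:
    """Serialize one payload as a single SSE frame."""
    return f"data: {json.dumps(payload)}\n\n"


def iter_sse_payloads(reads: Iterable[str]) -> Iterator[dict]:
    """Yield decoded payloads from text pieces that may split frames anywhere."""
    buffer = ""
    for piece in reads:
        buffer += piece
        # Only frames followed by a blank line are complete; keep the rest buffered
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            for line in frame.split("\n"):
                if line.startswith("data: "):
                    yield json.loads(line[len("data: "):])


payloads = [{"text": "Hel"}, {"text": "lo"}, {"done": True, "tokens": 12, "cost": 0.0}]
wire = "".join(sse_event(p) for p in payloads)
# Simulate reads that cut through the middle of frames
pieces = [wire[:9], wire[9:30], wire[30:]]
decoded = list(iter_sse_payloads(pieces))
```

Since json.dumps escapes newlines inside strings, a payload can never contain the blank-line delimiter, which is what makes splitting on it safe here.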
RAG Pipeline
Document ingestion → chunking → embedding → Qdrant storage. Retrieval at chat time injects top-K relevant chunks into the system prompt.
Ingestion Worker: workers/tasks.py
import asyncio, os, re, uuid

import pdfplumber
from celery import Celery
from openai import AsyncOpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, Distance, VectorParams

# Connection settings come from the environment (see .env.example)
app = Celery("tasks", broker=os.environ.get("REDIS_URL", "redis://localhost:6379"))
qdrant = QdrantClient(url=os.environ["QDRANT_URL"], api_key=os.environ.get("QDRANT_API_KEY"))
openai = AsyncOpenAI()  # reads OPENAI_API_KEY


def chunk_text(text: str, chunk_size=256, overlap=32) -> list[str]:
    """Split text into overlapping chunks. Sizes are in words, a rough proxy for tokens."""
    words = text.split()
    chunks, i = [], 0
    while i < len(words):
        chunk = " ".join(words[i : i + chunk_size])
        chunks.append(chunk)
        i += chunk_size - overlap
    # Drop fragments too short to be useful context
    return [c for c in chunks if len(c.strip()) > 20]


async def embed_texts(texts: list[str]) -> list[list[float]]:
    res = await openai.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    return [e.embedding for e in res.data]


def ensure_collection(workspace_id: str):
    # One Qdrant collection per workspace
    col = f"ws_{workspace_id.replace('-','_')}"
    if col not in [c.name for c in qdrant.get_collections().collections]:
        qdrant.create_collection(col, vectors_config=VectorParams(
            size=1536, distance=Distance.COSINE  # 1536 = text-embedding-3-small dimension
        ))
    return col


@app.task(bind=True, max_retries=3)
def ingest_document(self, doc_id: str, workspace_id: str,
                    chatbot_id: str, file_path: str, source_type: str):
    # Celery tasks are synchronous, so run the async pipeline to completion here
    asyncio.run(_ingest(doc_id, workspace_id, chatbot_id, file_path, source_type))


async def _ingest(doc_id, workspace_id, chatbot_id, file_path, source_type):
    from ..db.client import db
    await db.execute(
        "UPDATE knowledge_documents SET status='processing' WHERE id=$1", doc_id
    )
    try:
        # 1. Extract text
        if source_type == "pdf":
            with pdfplumber.open(file_path) as pdf:
                text = "\n".join(p.extract_text() or "" for p in pdf.pages)
        elif source_type == "url":
            import trafilatura
            downloaded = trafilatura.fetch_url(file_path)
            text = trafilatura.extract(downloaded) or ""
        else:
            with open(file_path, encoding="utf-8") as f:
                text = f.read()

        # 2. Chunk
        chunks = chunk_text(re.sub(r'\s+', ' ', text))

        # 3. Embed (batch of 100)
        all_embeddings = []
        for i in range(0, len(chunks), 100):
            batch = await embed_texts(chunks[i:i+100])
            all_embeddings.extend(batch)

        # 4. Upsert to Qdrant
        col = ensure_collection(workspace_id)
        points = [
            PointStruct(
                id=str(uuid.uuid4()),
                vector=emb,
                payload={
                    "text": chunk,
                    "doc_id": doc_id,
                    "chatbot_id": chatbot_id,
                    "workspace_id": workspace_id,
                    "chunk_index": i,
                }
            )
            for i, (chunk, emb) in enumerate(zip(chunks, all_embeddings))
        ]
        qdrant.upsert(collection_name=col, points=points)

        await db.execute(
            "UPDATE knowledge_documents SET status='indexed', chunk_count=$1 WHERE id=$2",
            len(chunks), doc_id
        )
    except Exception as e:
        await db.execute(
            "UPDATE knowledge_documents SET status='error', error_msg=$1 WHERE id=$2",
            str(e), doc_id
        )
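With chunk_size=256 and overlap=32, each chunk starts 224 words after the previous one. The sketch below uses a hypothetical helper, chunk_windows, to compute only the word offsets, which makes the overlap easy to see without any text. It mirrors the loop in chunk_text but skips the short-fragment filter.

```python
def chunk_windows(n_words: int, chunk_size: int = 256, overlap: int = 32) -> list[tuple[int, int]]:
    """Return (start, end) word offsets for each chunk; end is exclusive."""
    if overlap >= chunk_size:
        # A non-positive stride would never advance through the document
        raise ValueError("overlap must be smaller than chunk_size")
    stride = chunk_size - overlap
    return [
        (start, min(start + chunk_size, n_words))
        for start in range(0, n_words, stride)
    ]


# A 600-word document yields three windows; neighbours share 32 words
windows = chunk_windows(600)
for start, end in windows:
    print(start, end)
```

The guard on overlap is an addition for the sketch. The chunk_text loop as written would not terminate if overlap were set equal to or larger than chunk_size.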
Retrieval: services/rag.py
import os

from openai import AsyncOpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

# Connection settings come from the environment (see .env.example)
qdrant = QdrantClient(url=os.environ["QDRANT_URL"], api_key=os.environ.get("QDRANT_API_KEY"))
openai = AsyncOpenAI()  # reads OPENAI_API_KEY


async def retrieve_context(
    query: str,
    chatbot_id: str,
    workspace_id: str,
    top_k: int = 5,
    score_threshold: float = 0.72
) -> list[dict]:
    # Embed the query with the same model used at ingestion time
    res = await openai.embeddings.create(
        model="text-embedding-3-small",
        input=query
    )
    q_vector = res.data[0].embedding
    col = f"ws_{workspace_id.replace('-','_')}"

    # Search the workspace collection, restricted to this chatbot's chunks
    results = qdrant.search(
        collection_name=col,
        query_vector=q_vector,
        limit=top_k,
        score_threshold=score_threshold,
        query_filter=Filter(must=[
            FieldCondition(key="chatbot_id", match=MatchValue(value=chatbot_id))
        ])
    )
    return [
        {"text": r.payload["text"], "score": round(r.score, 3), "doc_id": r.payload["doc_id"]}
        for r in results
    ]
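To show what top_k and score_threshold do together, here is a brute-force sketch in plain Python. It is not how Qdrant works internally (Qdrant uses an index and runs the search server-side). The function names and the two-dimensional toy vectors are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query, chunks, top_k=5, score_threshold=0.72):
    """Score every chunk, drop those below the threshold, return the best top_k."""
    scored = [(cosine(query, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= score_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


# Toy 2-D "embeddings": similarity to the query is about 0.99, 0.60 and 0.80
chunks = [
    ("refund policy", [0.9, 0.1]),
    ("shipping times", [0.6, 0.8]),
    ("pricing tiers", [0.8, 0.6]),
]
matches = top_matches([1.0, 0.0], chunks)
print([text for _, text in matches])
```

The threshold means a query with no sufficiently similar chunk returns an empty list, in which case the chat endpoint sends the system prompt without a knowledge-base section.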
Embed Widget
Vanilla JS, zero dependencies, ~6KB gzipped. Injected via a single <script> tag. Self-contained shadow DOM to prevent CSS leakage.
// widget/src/widget.ts, compiled to widget/dist/widget.js
(function() {
  const config = window.ModelPilotConfig || {};
  const BOT_ID = config.botId || document.currentScript.dataset.botId;
  const API = "https://api.modelpilot.ai/v1";

  let sessionId = localStorage.getItem("mp_session");
  if (!sessionId) {
    sessionId = crypto.randomUUID();
    localStorage.setItem("mp_session", sessionId);
  }

  // Fetch widget config from API
  async function init() {
    const res = await fetch(`${API}/widget/${BOT_ID}/config`);
    const cfg = await res.json();
    render(cfg);
  }

  function render(cfg) {
    const host = document.createElement("div");
    const shadow = host.attachShadow({ mode: "closed" });
    document.body.appendChild(host);
    const side = cfg.position === "bottom-left" ? "left" : "right";

    shadow.innerHTML = `
      <style>
        :host { all: initial; font-family: system-ui; }
        #launcher {
          position: fixed; ${side}: 20px; bottom: 20px;
          width: 52px; height: 52px; border-radius: 50%;
          background: ${cfg.accentColor}; cursor: pointer;
          display: flex; align-items: center; justify-content: center;
          font-size: 24px; box-shadow: 0 4px 20px rgba(0,0,0,0.18);
          z-index: 999999; border: none; transition: transform .2s;
        }
        #launcher:hover { transform: scale(1.08); }
        #window {
          position: fixed; ${side}: 20px; bottom: 82px;
          width: 360px; height: 560px; border-radius: 18px;
          background: #fff; box-shadow: 0 12px 48px rgba(0,0,0,0.18);
          display: none; flex-direction: column; overflow: hidden;
          z-index: 999998;
        }
        #window.open { display: flex; }
        #header {
          background: ${cfg.accentColor}; padding: 14px 16px;
          color: white; font-weight: 700; font-size: 14px;
          display: flex; align-items: center; gap: 10px;
        }
        #messages {
          flex: 1; overflow-y: auto; padding: 14px;
          display: flex; flex-direction: column; gap: 10px;
          background: #f7f8fc;
        }
        .msg { max-width: 82%; padding: 10px 14px; border-radius: 12px; font-size: 13.5px; line-height: 1.5; }
        .user { align-self: flex-end; background: ${cfg.accentColor}; color: white; border-radius: 12px 12px 2px 12px; }
        .bot { align-self: flex-start; background: white; color: #09090b; border-radius: 12px 12px 12px 2px; box-shadow: 0 1px 4px rgba(0,0,0,.08); }
        #input-row { display: flex; gap: 8px; padding: 10px; border-top: 1px solid #f0f0f0; }
        #input { flex: 1; border: 1px solid #e4e4e7; border-radius: 9px; padding: 8px 12px; font-size: 13px; outline: none; }
        #send { background: ${cfg.accentColor}; color: white; border: none; border-radius: 9px; padding: 8px 14px; cursor: pointer; font-weight: 700; }
      </style>
      <button id="launcher">💬</button>
      <div id="window">
        <div id="header">🤖 <span id="bot-name"></span></div>
        <div id="messages"></div>
        <div id="input-row">
          <input id="input" placeholder="Type a message…" />
          <button id="send">Send</button>
        </div>
      </div>`;

    const launcher = shadow.getElementById("launcher");
    const win = shadow.getElementById("window");
    const msgs = shadow.getElementById("messages");
    const input = shadow.getElementById("input");
    const send = shadow.getElementById("send");
    let history = [];

    // Set tenant-supplied strings via textContent so they are never parsed as HTML
    shadow.getElementById("bot-name").textContent = cfg.botName || "Assistant";
    addMsg("bot", cfg.greeting || "Hi! How can I help?");

    launcher.onclick = () => win.classList.toggle("open");

    async function sendMessage() {
      const text = input.value.trim();
      if (!text) return;
      input.value = "";
      addMsg("user", text);
      history.push({ role: "user", content: text });

      const botEl = addMsg("bot", "▋"); // streaming cursor
      let full = "";
      const res = await fetch(`${API}/widget/${BOT_ID}/chat`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ session_id: sessionId, message: text, history })
      });

      const reader = res.body.getReader();
      const dec = new TextDecoder();
      let buffer = "";
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // A read can end mid-event, so buffer and parse only complete lines
        buffer += dec.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop();
        lines
          .filter(l => l.startsWith("data: "))
          .forEach(l => {
            const p = JSON.parse(l.slice(6));
            if (p.text) { full += p.text; botEl.textContent = full + "▋"; }
            if (p.done) { botEl.textContent = full; }
          });
        msgs.scrollTop = msgs.scrollHeight;
      }
      history.push({ role: "assistant", content: full });
    }

    function addMsg(role, text) {
      const el = document.createElement("div");
      el.className = "msg " + role;
      el.textContent = text;
      msgs.appendChild(el);
      msgs.scrollTop = msgs.scrollHeight;
      return el;
    }

    send.onclick = sendMessage;
    input.onkeydown = e => e.key === "Enter" && sendMessage();
  }

  init();
})();
Auth & Multi-tenancy
Supabase handles JWT issuance and OAuth. FastAPI verifies JWTs and injects workspace context. Row Level Security on every table enforces isolation at the DB layer.
FastAPI JWT middleware: middleware/auth.py
from dataclasses import dataclass
import os

from fastapi import Header, HTTPException
from jose import jwt, JWTError

from ..db.client import db

SUPABASE_JWT_SECRET = os.environ["SUPABASE_JWT_SECRET"]


@dataclass
class WorkspaceCtx:
    user_id: str
    workspace_id: str
    role: str
    plan: str            # type assumed; mirrors workspaces.plan
    message_quota: int   # type assumed; mirrors workspaces.message_quota


async def get_current_workspace(
    authorization: str = Header(...)
) -> WorkspaceCtx:
    try:
        token = authorization.removeprefix("Bearer ")
        payload = jwt.decode(
            token,
            SUPABASE_JWT_SECRET,
            algorithms=["HS256"],
            # Supabase user tokens carry aud="authenticated"; python-jose
            # rejects an aud claim it was not told to expect
            audience="authenticated",
        )
        user_id = payload["sub"]
    except (JWTError, KeyError):
        raise HTTPException(401, "Invalid token")

    # Load workspace membership (the 60s Redis cache is not shown here)
    row = await db.fetchrow("""
        SELECT wm.workspace_id, wm.role, w.plan, w.message_quota
        FROM workspace_members wm
        JOIN workspaces w ON w.id = wm.workspace_id
        WHERE wm.user_id = $1 AND wm.joined_at IS NOT NULL
        ORDER BY wm.joined_at
        LIMIT 1
    """, user_id)
    if not row:
        raise HTTPException(403, "No workspace found")

    return WorkspaceCtx(
        user_id=user_id,
        workspace_id=str(row["workspace_id"]),
        role=row["role"],
        plan=row["plan"],
        message_quota=row["message_quota"],
    )
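For readers unfamiliar with what an HS256 check involves, the sketch below reimplements signing and verification with the standard library only. It is illustrative and not a replacement for python-jose. The names sign_hs256 and verify_hs256 are hypothetical, and it checks only the signature, the alg header, and exp.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(part: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def _signature(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: str) -> str:
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(claims).encode())
    sig = _b64url_encode(_signature(f"{header}.{body}", secret))
    return f"{header}.{body}.{sig}"


def verify_hs256(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the claims if the token is valid; raise ValueError otherwise."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, body_b64, sig_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _signature(f"{header_b64}.{body_b64}", secret)
    # Constant-time comparison avoids leaking how many bytes matched
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(body_b64))
    if "exp" in claims and claims["exp"] < (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims


token = sign_hs256({"sub": "user-1", "exp": 2000}, "dev-secret")
claims = verify_hs256(token, "dev-secret", now=1000)
```

Pinning the accepted algorithm, as the middleware does with algorithms=["HS256"], matters because the alg header is attacker-controlled.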
Supabase RLS Policies: SQL
-- Enable RLS on all tables (a policy has no effect until RLS is enabled on its table)
ALTER TABLE chatbots ENABLE ROW LEVEL SECURITY;
ALTER TABLE knowledge_documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE conversations ENABLE ROW LEVEL SECURITY;
ALTER TABLE messages ENABLE ROW LEVEL SECURITY;
ALTER TABLE usage_events ENABLE ROW LEVEL SECURITY;
ALTER TABLE ai_providers ENABLE ROW LEVEL SECURITY;

-- Helper function: get user's workspace IDs
CREATE OR REPLACE FUNCTION auth.workspace_ids()
RETURNS uuid[]
LANGUAGE sql STABLE
AS $$
  SELECT array_agg(workspace_id)
  FROM workspace_members
  WHERE user_id = auth.uid() AND joined_at IS NOT NULL;
$$;

-- Chatbots: members can read, editors/admins can write
CREATE POLICY chatbots_select ON chatbots
  FOR SELECT USING (workspace_id = ANY(auth.workspace_ids()));

CREATE POLICY chatbots_insert ON chatbots
  FOR INSERT WITH CHECK (
    workspace_id IN (
      SELECT workspace_id FROM workspace_members
      WHERE user_id = auth.uid() AND role IN ('editor','admin')
    )
  );

CREATE POLICY chatbots_update ON chatbots
  FOR UPDATE USING (
    workspace_id IN (
      SELECT workspace_id FROM workspace_members
      WHERE user_id = auth.uid() AND role IN ('editor','admin')
    )
  );

-- Conversations: members can view only their workspace
CREATE POLICY conversations_select ON conversations
  FOR SELECT USING (workspace_id = ANY(auth.workspace_ids()));

-- Admins only: billing + provider keys
CREATE POLICY providers_admin ON ai_providers
  FOR ALL USING (
    workspace_id IN (
      SELECT workspace_id FROM workspace_members
      WHERE user_id = auth.uid() AND role = 'admin'
    )
  );
Next.js Auth middleware: middleware.ts
import { createMiddlewareClient } from "@supabase/auth-helpers-nextjs";
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

export async function middleware(req: NextRequest) {
  const res = NextResponse.next();
  const supabase = createMiddlewareClient({ req, res });
  const { data: { session } } = await supabase.auth.getSession();

  const isAuthPage = req.nextUrl.pathname.startsWith("/login");

  // Unauthenticated users go to login; signed-in users skip the login page
  if (!session && !isAuthPage) {
    return NextResponse.redirect(new URL("/login", req.url));
  }
  if (session && isAuthPage) {
    return NextResponse.redirect(new URL("/", req.url));
  }
  return res;
}

export const config = {
  matcher: ["/((?!_next/static|_next/image|favicon|widget.js).*)"]
};
Deployment
Frontend on Vercel, backend on Railway. Widget JS hosted on Cloudflare R2 for <50ms global delivery. All services deploy on git push.
docker-compose.yml (Local Dev)
version: "3.9"
services:
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]

  qdrant:
    image: qdrant/qdrant:latest
    ports: ["6333:6333"]
    volumes: ["./qdrant_storage:/qdrant/storage"]

  api:
    build: ./apps/api
    ports: ["8000:8000"]
    env_file: .env
    depends_on: [redis, qdrant]
    command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload

  worker:
    build: ./apps/api
    env_file: .env
    depends_on: [redis]
    command: celery -A workers.tasks worker --loglevel=info
Security Checklist
Environment Variables
# .env.example

# Supabase
SUPABASE_URL=https://xxxx.supabase.co
SUPABASE_ANON_KEY=eyJ...
SUPABASE_SERVICE_KEY=eyJ...   # Never expose to client
SUPABASE_JWT_SECRET=your-jwt-secret

# Qdrant
QDRANT_URL=https://xxxx.qdrant.io
QDRANT_API_KEY=...

# OpenAI (for embeddings only)
OPENAI_API_KEY=sk-...

# Encryption key for stored provider keys (32 bytes)
ENCRYPTION_KEY=base64-encoded-32-byte-key

# Redis
REDIS_URL=redis://localhost:6379

# Stripe
STRIPE_SECRET_KEY=sk_live_...
STRIPE_WEBHOOK_SECRET=whsec_...
STRIPE_PRICE_ID_STARTER=price_...
STRIPE_PRICE_ID_PRO=price_...

# App
NEXT_PUBLIC_API_URL=https://api.modelpilot.ai/v1
NEXT_PUBLIC_SUPABASE_URL=https://xxxx.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJ...
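ENCRYPTION_KEY has to decode to exactly 32 bytes for AES-256, and a key of the wrong length is easier to catch at startup than at the first encryption call. The sketch below shows one way to validate it. The function name load_encryption_key is hypothetical, and the environment is passed in as a plain dict so the check is easy to test.

```python
import base64
import os


def load_encryption_key(env: dict) -> bytes:
    """Decode ENCRYPTION_KEY from base64 and require exactly 32 bytes."""
    raw = env.get("ENCRYPTION_KEY")
    if not raw:
        raise RuntimeError("ENCRYPTION_KEY is not set")
    try:
        key = base64.b64decode(raw, validate=True)
    except ValueError as exc:  # binascii.Error is a ValueError subclass
        raise RuntimeError("ENCRYPTION_KEY is not valid base64") from exc
    if len(key) != 32:
        raise RuntimeError(f"ENCRYPTION_KEY must decode to 32 bytes, got {len(key)}")
    return key


# Generating a fresh key for a local .env file
fresh = base64.b64encode(os.urandom(32)).decode()
key = load_encryption_key({"ENCRYPTION_KEY": fresh})
```

In the API process this would be called once with os.environ during startup, so a misconfigured deployment fails before serving traffic.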